## -*- mode: html; coding: utf-8; -*-

## This file is part of CDS Invenio.
## Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008 CERN.
##
## CDS Invenio is free software; you can redistribute it and/or
## modify it under the terms of the GNU General Public License as
## published by the Free Software Foundation; either version 2 of the
## License, or (at your option) any later version.
##
## CDS Invenio is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with CDS Invenio; if not, write to the Free Software Foundation, Inc.,
## 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

<!-- WebDoc-Page-Title: Coding Style -->
<!-- WebDoc-Page-Navtrail: <a class="navtrail" href="<CFG_SITE_URL>/help/hacking">Hacking CDS Invenio</a> -->
<!-- WebDoc-Page-Revision: $Id$ -->

<p>A brief description of things we strive at, more or less unsuccessfully.</p>

<h2>1. Packaging</h2>

<p>We use the classical GNU Autoconf/Automake approach, for tutorial
see
e.g. <a href="http://www.amath.washington.edu/~lf/tutorials/autoconf/tutorial_toc.html">Learning
the GNU development tools</a> or
the <a href="http://sources.redhat.com/autobook/autobook/autobook_toc.html">AutoBook</a>.</p>

<h2>2. Modules</h2>

<p>CDS Invenio started as a set of pretty independent modules
developed by independent people with independent styles.  This was
even more pronounced by the original use of many different languages
(e.g. Python, PHP, Perl).  Now the CDS Invenio code base is striving
to use Python everywhere, except in speed-critical parts when a
compiled language such as Common Lisp may come to the rescue in the
near future.</p>

<p>When modifying an existing module, we propose to strictly continue
using whatever coding style the module was originally written into.
When writing new modules, we propose to stick to the below-mentioned
standards.</p>

<p>The code integration across modules is happening, but is slow.
Therefore, don't be surprised to see that there is a lot of room to
refactor.</p>

<h2>3. Python</h2>

<p>We aim at following recommendations
from <a href="http://www.python.org/peps/pep-0008.html">PEP 8</a>,
although the existing code surely do not fulfil them here and there.
The code indentation is done via spaces only, please do not use tabs.
One tab counts as four spaces.  Emacs users can look into
our <a href="<CFG_SITE_URL>/hacking/cdsware.el">cdsware.el</a> and
our <a href="https://twiki.cern.ch/twiki/bin/view/CDS/EmacsTips">Emacs
Tips wiki page</a> for inspiration.</p>

<p>All the Python code should be extensively documented via
docstrings, so you can always run pydoc file.py to peruse the file's
documentation in one simple go.  We follow
the <a href="http://epydoc.sourceforge.net/manual-epytext.html">epytext</a>
docstring markup, which generates nice
HTML <a href="http://cdsware.cern.ch/invenio/code-browser/">source
code documentation.</a></p>

<p>Do not forget to run pylint on your code to check for errors like
uninitialized variables and to improve its quality and conformance to
the coding standard.  If you develop in Emacs, run M-x pylint RET on
your buffers frequently.  Read and implement pylint suggestions.
(Note that using lambda and friends may lead to false pylint warnings.
You can switch them off by putting block comments of the form ``#
pylint: disable-msg=C0301''.)</p>

<p>Do not forget to run pychecker on your code either.  It is another
source code checker that catches some situations better and some
situations worse than pylint.  If you develop in Emacs, run C-c C-w
(M-x py-pychecker-run RET) on your buffers frequently.  (Note that
using psyco on classes may lead to false pychecker warnings.)</p>

<p>You can check the kwalitee of your code by running ``python
modules/miscutil/lib/kwalitee.py --check-all *.py'' on your files.
This will run some basic error checking, warning checking, indentation
checking, but also compliance to PEP 8.  You can also check the code
kwalitee stats across all the modules by running ``make
kwalitee-check'' in the main source directory.</p>

<p>Do not hardcode magic constants in your code.  Every magic string
or a number should be put into accompanying file_config.py with symbol
name beginning by cfg_modulename_*.</p>

<p>Clearly separate interfaces from implementation.  Document your
interfaces.  Do not expose to other modules anything that does not
have to be exposed.  Apply principle of least information.</p>

<p>Create as few new library files as possible.  Do not create many
nested files in nested modules; rather put all the lib files in one
dir with bibindex_foo and bibindex_bar names.</p>

<p>Use imperative/functional paradigm rather then OO.  If you do use
OO, then stick to as simple class hierarchy as possible.  Recall that
method calls and exception handling in Python are quite expensive.</p>

<p>Use rather the good old foo_bar naming convention for symbols (both
variables and function names) instead of fooBar CaMelCaSe convention.
(Except for Class names where UppercaseSymbolNames are to be
used.)</p>

<p>Pay special attention to name your symbols descriptively.  Your
code is going to be read and work with by others and its symbols
should be self-understandable without any comments and without
studying other parts of the code.  For example, use proper English
words, not abbreviations that can be misspelled in many a way; use
words that go in pair (e.g. create/destroy, start/stop; never
create/stop); use self-understandable symbol names
(e.g. list_of_file_extensions rather than list2); never misname
symbols (e.g. score_list should hold the list of scores and nothing
else - if in the course of development you change the semantics of
what the symbol holds then change the symbol name too).  Do not be
afraid to use long descriptive names; good editors such as Emacs can
tab-complete symbols for you.</p>

<p>When hacking module A, pay close attention to ressemble existing
coding convention in A, even if it is legacy-weird and even if we use
a different technique elsewhere.  (Unless the whole module A is going
to be refactored, of course.)</p>

<p>Speed-critical parts should be profiled with pyprof or our built-in
web profiler (<code>&profile=t</code>).</p>

<p>The code should be well tested before committed.  Testing is an
integral part of the development process.  Test along as you
program.  The testing process should be automatized via our unit
test and regression test suite infrastructures.  Please read the
<a href="test-suite">test suite strategy</a> to know more.</p>

<p>Python promotes writing clear, readable, easily maintainable code.
Write it as such.  Recall Albert Einstein's ``Everything should be
made as simple as possible, but not simpler''.  Things should be
neither overengineered nor oversimplified.</p>

<p>Recall principles Unix is built upon.  As summarized by Eric
S. Reymond's <a href="http://www.catb.org/esr/writings/taoup/html/ch01s06.html#id2877537">TAOUP</a>:

<ul>
  <li>Rule of Modularity: Write simple parts connected by clean interfaces.
  <li>Rule of Clarity: Clarity is better than cleverness.
  <li>Rule of Composition: Design programs to be connected with other programs.
  <li>Rule of Separation: Separate policy from mechanism; separate interfaces from engines.
  <li>Rule of Simplicity: Design for simplicity; add complexity only where you must.
  <li>Rule of Parsimony: Write a big program only when it is clear by demonstration that nothing else will do.
  <li>Rule of Transparency: Design for visibility to make inspection and debugging easier.
  <li>Rule of Robustness: Robustness is the child of transparency and simplicity.
  <li>Rule of Representation: Fold knowledge into data, so program logic can be stupid and robust.
  <li>Rule of Least Surprise: In interface design, always do the least surprising thing.
  <li>Rule of Silence: When a program has nothing surprising to say, it should say nothing.
  <li>Rule of Repair: Repair what you can -- but when you must fail, fail noisily and as soon as possible.
  <li>Rule of Economy: Programmer time is expensive; conserve it in preference to machine time.
  <li>Rule of Generation: Avoid hand-hacking; write programs to write programs when you can.
  <li>Rule of Optimization: Prototype before polishing. Get it working before you optimize it.
  <li>Rule of Diversity: Distrust all claims for one true way.
  <li>Rule of Extensibility: Design for the future, because it will be here sooner than you think.
</ul>

   or the golden rule that says it all: ``keep it simple''.</p>

<p>Think of security and robustness from the start.
Follow <a href="http://security.web.cern.ch/security/SecureSoftware/checklist.htm">secure
programming guidelines</a>.</p>

<p>For more hints, thoughts, and other ruminations on programming, see
our <a href="https://twiki.cern.ch/twiki/bin/view/CDS/Invenio">CDS
Invenio wiki</a>,
notably <a href="https://twiki.cern.ch/twiki/bin/view/CDS/GitWorkflow">Git
Workflow</a>
and <a href="https://twiki.cern.ch/twiki/bin/view/CDS/InvenioQualityAssurance">Invenio
QA</a>.</p>

<h2>3. MySQL</h2>

<p>Table naming policy is, roughly and briefly:</p>

<ul>
  <li>"foo": table names in lowercase, without prefix, used by me for
         WebSearch

  <li>"foo_bar": underscores represent M:N relationship between "foo"
        and "bar", to tie the two tables together

  <li>"bib*": many tables to hold the metadata and relationships
         between them

  <li>"idx*": idx is the table name prefix used by BibIndex

  <li>"rnk*": rnk is the table name prefix used by BibRank

  <li>"fmt*": fmt is the table name prefix used by BibFormat

  <li>"sbm*": sbm is the table name prefix used by WebSubmit

  <li>"sch*": sch is the table name prefix used by BibSched

  <li>"acc*": acc is the table name prefix used by WebAccess

  <li>"bsk*": acc is the table name prefix used by WebBasket

  <li>"msg*": acc is the table name prefix used by WebMessage

  <li>"cls*": acc is the table name prefix used by BibClassify

  <li>"sta*": acc is the table name prefix used by WebStat

  <li>"jrn*": acc is the table name prefix used by WebJournal

  <li>"collection*": many tables to describe collections and search
        interface pages

  <li>"user*" : many tables to describe personal features (baskets,
        alerts)

  <li>"hst*": tables related to historical versions of metadata and
        fulltext files
</ul>

- end of file -
